Finding large and small dense subgraphs 



Reid Andersen 
February 1, 2008 

Abstract 

We consider two optimization problems related to finding dense
subgraphs, which are induced subgraphs with high average degree.
The densest at-least-k-subgraph problem (DalkS) is to find an induced
subgraph of highest average degree among all subgraphs with at least
k vertices, and the densest at-most-k-subgraph problem (DamkS) is
defined similarly. These problems are related to the well-known densest
k-subgraph problem (DkS), which is to find the densest subgraph
on exactly k vertices. Our main result is that DalkS can be approximated
efficiently, while DamkS is nearly as hard to approximate as
the densest k-subgraph problem. We give two algorithms for DalkS: a
3-approximation algorithm that runs in time $O(m + n \log n)$, and a
2-approximation algorithm that runs in polynomial time. In contrast, we
show that if there exists a polynomial time approximation algorithm
for DamkS with ratio $\gamma$, then there is a polynomial time approximation
algorithm for DkS with ratio $4(\gamma^2 + \gamma)$.



1 Introduction 



The density of an induced subgraph is the total weight of its edges divided 
by the size of its vertex set, or half its average degree. The problem of finding 
the densest subgraph of a given graph, and various related problems, have 
been studied extensively. In the past decade, identifying subgraphs with 
high density has become an important task in the analysis of large networks
[a no].

There are a variety of efficient algorithms for finding the densest subgraph
of a given graph. The densest subgraph can be identified in polynomial
time by solving a maximum flow problem [9]. Charikar [5] gave a greedy
algorithm that produces a 2-approximation of the densest subgraph in linear
time. Kannan and Vinay [12] gave a spectral approximation algorithm for
a related notion of density. Both of these approximation algorithms are fast
enough to run on extremely large graphs.

In contrast, no practical algorithms are known for finding the densest
subgraph on exactly k vertices. If k is specified as part of the input and is
allowed to vary with the graph size n, the best polynomial time algorithm
known has approximation ratio $n^\delta$, where $\delta$ is slightly less than 1/3. This
algorithm is due to Feige, Peleg, and Kortsarz [7]. The densest k-subgraph
problem is known to be NP-complete, but there is a large gap between this
approximation ratio and the strongest known hardness result.

In many of the graphs we would like to analyze (for example, graphs 
arising from sponsored search auctions, or from links between blogs), the 
densest subgraph is extremely small relative to the size of the graph. When 
this is the case, we would like to find a subgraph that is both large and dense, 
without solving the seemingly intractable densest k-subgraph problem. To
address this concern, we introduce the densest at-least-k-subgraph problem,
which is to find the densest subgraph on at least k vertices.

In this paper, we show that the densest at-least-k-subgraph problem can
be approximated nearly as efficiently as the densest subgraph problem. In fact,
we show it can be approximated by a careful application of the same techniques.
We give a greedy 3-approximation algorithm for DalkS that runs in time
$O(m + n \log n)$ in a weighted graph, and time $O(m)$ in an unweighted graph.
This algorithm is an extension of Charikar's algorithm for the densest subgraph
problem. We also give a 2-approximation algorithm for DalkS that runs in
polynomial time, and can be computed by solving a single parametric flow
problem. This is an extension of the algorithm of Gallo, Grigoriadis, and
Tarjan [9] for the densest subgraph problem.



We also show that finding a dense subgraph with at most k vertices is 
nearly as hard as finding the densest subgraph with exactly k vertices. In 
particular, we prove that a polynomial time $\gamma$-approximation algorithm for
the densest at-most-k-subgraph problem would imply a polynomial time
$4(\gamma^2 + \gamma)$-approximation algorithm for the densest k-subgraph problem.
More generally, if there exists a polynomial time algorithm that approximates
DamkS in a weak sense, returning a set of at most $\beta k$ vertices with
density at least $1/\gamma$ times the density of the densest subgraph on at most k
vertices, then there is a polynomial time approximation algorithm for DkS
with ratio $4(\gamma^2 + \gamma\beta)$.

Our algorithms for DalkS can find subgraphs with nearly optimal density 
in extremely large graphs, while providing considerable control over the sizes 
of those subgraphs. Our reduction of DkS to DamkS gives additional insight 
into when DkS is hard, and suggests a possible approach for improving the 
approximation ratio for DkS. 

The paper is organized as follows. We first consider the DalkS problem,
presenting the greedy 3-approximation in Section 3 and the polynomial time
2-approximation in Section 4. We consider the DamkS problem in Section 5.
In Section 6 we discuss the possibility of finding a good approximation
algorithm for DamkS.

1.1 Related work 

We will briefly survey a few results on the complexity of the densest
k-subgraph problem. The best approximation algorithm known for the general
problem (when k is specified as part of the input) is the algorithm of Feige,
Peleg, and Kortsarz [7], which has ratio $O(n^\delta)$ for some $\delta < 1/3$. For any
particular value of k, the greedy algorithm of Asahiro et al. [4] gives the
ratio $O(n/k)$. Algorithms based on linear programming and semidefinite
programming have produced approximation ratios better than $O(n/k)$ for
certain values of k, but have not improved the approximation ratio of $n^\delta$ for
the general case [U [6].

Feige and Seltser [8] showed that the densest k-subgraph problem is
NP-complete when restricted to bipartite graphs of maximum degree 3, by a
reduction from max-clique. This reduction does not produce a hardness of
approximation result for DkS. In fact, they showed that if a graph contains
a k-clique, a subgraph with k vertices and $(1-\epsilon)\binom{k}{2}$ edges can be found
in subexponential time. Khot [13] proved there can be no PTAS for the
densest k-subgraph problem, under a standard complexity assumption.

Arora, Karger, and Karpinski [2] gave a PTAS for the special case
$k = \Omega(n)$ and $m = \Omega(n^2)$. Asahiro, Hassin, and Iwama [3] showed that the
problem is still NP-complete in very sparse graphs.

2 Definitions



Let G = (V, E) be an undirected graph with a weight function $w : E \to \mathbb{R}^+$
which assigns a positive weight to each edge. The weighted degree w(v,G) 
is the sum of the weights of the edges incident with v. The total weight 
W(G) is the sum of the weights of the edges in G. 

Definition 1. For any induced subgraph H of G, we define the density of
H to be

$$d(H) = \frac{W(H)}{|H|}.$$

Definition 2. For an undirected graph G, we define the following quantities.

dal(G, k): the maximum density of an induced subgraph on at least k vertices,
dam(G, k): the maximum density of an induced subgraph on at most k vertices,
dex(G, k): the maximum density of an induced subgraph on exactly k vertices,
dmax(G): the maximum density of any induced subgraph.



The densest at-least-k-subgraph problem (DalkS) is to find an induced
subgraph on at least k vertices achieving density dal(G, k). Similarly, the
densest at-most-k-subgraph problem (DamkS) is to find an induced subgraph
on at most k vertices achieving density dam(G, k). The densest
k-subgraph problem (DkS) is to find an induced subgraph on exactly k vertices
achieving dex(G, k), and the densest subgraph problem is to find an induced
subgraph of any size achieving dmax(G).

We now formally define what it means to be an approximation algorithm
for DalkS. Approximation algorithms for DamkS, DkS, and the densest subgraph
problem are defined similarly.

Definition 3. An algorithm A(G, k) is a $\gamma$-approximation algorithm for
the densest at-least-k-subgraph problem if for any graph G and integer k,
it returns an induced subgraph H of G on at least k vertices with density
$d(H) \ge \mathrm{dal}(G,k)/\gamma$.
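To make Definitions 2 and 3 concrete, here is a minimal brute-force sketch that evaluates dal, dam, dex, and dmax exactly on tiny graphs and checks whether a given output meets the condition of Definition 3. It assumes a hypothetical edge-dictionary representation (`frozenset({u, v}) -> weight` on vertices 0..n-1); all function names are illustrative, not from the paper. It enumerates every vertex subset, so it is only useful as a reference when testing the approximation algorithms below.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

# Hypothetical representation: frozenset({u, v}) -> positive edge weight,
# on the vertex set {0, ..., n-1}.
Graph = Dict[FrozenSet[int], float]


def density(g: Graph, nodes: Iterable[int]) -> float:
    """d(H) = W(H) / |H| for the subgraph induced by `nodes` (Definition 1)."""
    s = set(nodes)
    if not s:
        return 0.0
    return sum(w for e, w in g.items() if e <= s) / len(s)


def best_density(g: Graph, n: int, sizes: Iterable[int]) -> float:
    """Exhaustive max of d(H) over induced subgraphs whose size lies in `sizes`.

    Enumerates vertex subsets, so it is exponential in n: a reference only.
    """
    best = 0.0
    for size in sizes:
        for nodes in combinations(range(n), size):
            best = max(best, density(g, nodes))
    return best


def dal(g: Graph, n: int, k: int) -> float:
    return best_density(g, n, range(max(k, 1), n + 1))


def dam(g: Graph, n: int, k: int) -> float:
    return best_density(g, n, range(1, min(k, n) + 1))


def dex(g: Graph, n: int, k: int) -> float:
    return best_density(g, n, [k])


def dmax(g: Graph, n: int) -> float:
    return best_density(g, n, range(1, n + 1))


def meets_definition_3(g: Graph, n: int, k: int, nodes: Iterable[int], gamma: float) -> bool:
    """True if `nodes` is an acceptable output of a gamma-approximation for DalkS."""
    s = set(nodes)
    return len(s) >= k and density(g, s) >= dal(g, n, k) / gamma
```

For example, on a triangle {0, 1, 2} with a pendant vertex 3 attached to 2, the triangle alone already attains dmax, while any 2-vertex set has density at most 1/2.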



3 The densest at-least-k-subgraph problem

In this section, we give a 3-approximation algorithm for the densest
at-least-k-subgraph problem that runs in time $O(m + n \log n)$ in a weighted graph, and
time $O(m)$ in an unweighted graph. The algorithm is a simple extension of
Charikar's greedy algorithm for the densest subgraph problem. To analyze
the algorithm, we relate the density of a graph to the size of its w-cores,
which are subgraphs with minimum weighted degree at least w.

ChALK(G, k):

Input: a graph G with n vertices, and an integer k.
Output: an induced subgraph of G with at least k vertices.

1. Let $H_n = G$ and repeat the following steps for $i = n, \ldots, 1$:

(a) Let $r_i$ be the minimum weighted degree of any vertex in $H_i$.

(b) Let $v_i$ be a vertex where $w(v_i, H_i) = r_i$.

(c) Remove $v_i$ from $H_i$ to form the induced subgraph $H_{i-1}$.

2. Compute the density $d(H_i)$ for each $i \in [1, n]$.

3. Output the induced subgraph $H_j$ with $j \ge k$ of maximum density (each $H_j$ has exactly j vertices, so it satisfies the size constraint).
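The three steps of ChALK can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Charikar's implementation: the graph is a hypothetical symmetric adjacency dictionary `adj[u][v] = weight`, `1 <= k <= n`, and there are no self-loops. It uses a binary heap with lazy deletion, so it runs in $O((m + n)\log n)$ time rather than the bounds of Theorem 2.

```python
import heapq
from typing import Dict, List

Adj = Dict[int, Dict[int, float]]  # adj[u][v] = weight of edge {u, v}, symmetric


def chalk(adj: Adj, k: int) -> List[int]:
    """Sketch of ChALK(G, k): greedy peeling, then the densest H_j with j >= k.

    Assumes 1 <= k <= n and no self-loops. Uses a binary heap with lazy
    deletion, so it runs in O((m + n) log n) time.
    """
    n = len(adj)
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    alive = set(adj)
    heap = [(d, u) for u, d in deg.items()]
    heapq.heapify(heap)
    total = sum(deg.values()) / 2            # W(H_n) = W(G)
    dens = {n: total / n}                    # dens[i] = d(H_i)
    order = []                               # removal order: v_n, v_{n-1}, ..., v_1
    for i in range(n, 0, -1):
        # Steps 1(a)-(b): skip stale heap entries until a live vertex of
        # current minimum weighted degree r_i surfaces.
        while True:
            r, u = heapq.heappop(heap)
            if u in alive and r == deg[u]:
                break
        # Step 1(c): removing v_i = u deletes exactly r_i worth of edge weight.
        alive.remove(u)
        order.append(u)
        total -= r
        for v, w in adj[u].items():
            if v in alive:
                deg[v] -= w
                heapq.heappush(heap, (deg[v], v))
        if i > 1:
            dens[i - 1] = total / (i - 1)
    # Steps 2-3: H_j is what remains after the first n - j removals.
    j = max(range(k, n + 1), key=lambda i: dens[i])
    return order[n - j:]
```

On a 4-clique with a pendant path attached, for instance, `chalk(adj, 1)` peels the path away and returns the clique, while a larger k forces the output to keep some of the path vertices.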

Theorem 1. ChALK(G, k) is a 3-approximation algorithm for the densest
at-least-k-subgraph problem.

We will prove Theorem 1 in the following subsection. The implementation
of step 1 described by Charikar (see [5]) gives us the following bound
on the running time of ChALK.

Theorem 2 (Charikar). The running time of ChALK(G, k) is $O(m)$ in an
unweighted graph, and $O(m + n \log n)$ in a weighted graph.

3.1 Analysis of ChALK 

The ChALK algorithm is easy to understand if we consider the relationship 
between induced subgraphs of G with high average degree (dense subgraphs) 
and induced subgraphs of G with high minimum degree (w-cores). 

Definition 4. Given a graph G and a weight $w \in \mathbb{R}$, the w-core $C_w(G)$ is
the unique largest induced subgraph of G with minimum weighted degree at
least w.
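A w-core can be computed directly by repeatedly deleting any vertex whose weighted degree in the remaining graph is below w; whatever survives has minimum weighted degree at least w, and no deleted vertex can belong to such a subgraph. The sketch below assumes the same hypothetical adjacency-dictionary representation `adj[u][v] = weight` and returns the vertex set of $C_w(G)$ (possibly empty). The name `w_core` is illustrative only.

```python
from collections import deque
from typing import Dict, Set

Adj = Dict[int, Dict[int, float]]  # adj[u][v] = weight of edge {u, v}, symmetric


def w_core(adj: Adj, w: float) -> Set[int]:
    """Vertex set of C_w(G): repeatedly delete vertices of weighted degree < w."""
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    alive = set(adj)
    queue = deque(u for u in adj if deg[u] < w)
    while queue:
        u = queue.popleft()
        if u not in alive:              # a vertex may be queued more than once
            continue
        alive.remove(u)
        for v, wt in adj[u].items():
            if v in alive:
                deg[v] -= wt            # degrees only decrease, so v stays below w once queued
                if deg[v] < w:
                    queue.append(v)
    return alive
```

The deletion order here differs from ChALK's, but the surviving set is the same, which is the content of Lemma 1 below.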

Here is an outline of how we will proceed. We first prove that the ChALK
algorithm computes all the w-cores of G (Lemma 1). We then prove that
for any induced subgraph H of G with density d, the (2d/3)-core of G has
total weight at least W(H)/3 (Lemma 2). We will prove Theorem 1 using
these two lemmas.

Lemma 1. Let $\{H_1, \ldots, H_n\}$, $\{v_1, \ldots, v_n\}$, and $\{r_1, \ldots, r_n\}$ be the induced
subgraphs, vertices, and weighted degrees determined by ChALK on the input
graph G. For any $w \in \mathbb{R}$, if $I(w)$ is the largest index such that $r_{I(w)} \ge w$,
then $H_{I(w)} = C_w(G)$.



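Proof. Fix a value of w. It is easy to prove by induction that none of the
vertices $v_n, \ldots, v_{I(w)+1}$ that were removed before $v_{I(w)}$ is contained in any
induced subgraph with minimum degree at least w. (If $v_n, \ldots, v_{j+1}$ lie in no such
subgraph, then every such subgraph is contained in $H_j$, where $v_j$ has weighted
degree $r_j < w$ for $j > I(w)$, so $v_j$ lies in none of them either.) That implies
$C_w(G) \subseteq H_{I(w)}$. On the other hand, the minimum degree of $H_{I(w)}$ is $r_{I(w)} \ge w$, so
$H_{I(w)} \subseteq C_w(G)$. Therefore, $H_{I(w)} = C_w(G)$. □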

Lemma 2. For any graph G with total weight W and density $d = W/|G|$,
the d-core of G is nonempty. Furthermore, for any $\alpha \in [0,1]$, the total
weight of the $(\alpha d)$-core of G is strictly greater than $(1-\alpha)W$.



Proof. Let $\{H_1, \ldots, H_n\}$ be the induced subgraphs determined by ChALK
on the input graph G. Fix a value of w, let $I(w)$ be the largest index such
that $r_{I(w)} \ge w$, and recall that $H_{I(w)} = C_w(G)$ by Lemma 1. Since each
edge in G is removed exactly once during the course of the algorithm, and
$r_i < w$ for every $i > I(w)$,

$$
\begin{aligned}
W &= \sum_{i=1}^{|G|} r_i = \sum_{i=1}^{I(w)} r_i + \sum_{i=I(w)+1}^{|G|} r_i \\
  &< W(H_{I(w)}) + w\,(|G| - I(w)) \\
  &\le W(C_w(G)) + w\,|G|.
\end{aligned}
$$

Therefore,

$$W(C_w(G)) > W - w\,|G|.$$

Taking $w = d = W/|G|$ in the inequality above, we learn that $W(C_d(G)) > 0$.
Taking $w = \alpha d = \alpha W/|G|$, we learn that $W(C_{\alpha d}(G)) > (1-\alpha)W$.

□
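As a small numerical illustration of Lemma 2 (an example of our own choosing, not part of the original analysis), let G be the 4-clique on {0, 1, 2, 3} with the path 3, 4, 5, 6 attached and all edge weights equal to 1, so that $W = 9$, $|G| = 7$, and $d = 9/7$. Peeling at threshold $d$ removes vertices 6, 5, 4 in turn (each has weighted degree $1 < 9/7$ when removed), and the clique survives with minimum degree $3$:

$$
\begin{aligned}
w = d = \tfrac{9}{7}:&\quad C_d(G) = K_4,\qquad W(C_d(G)) = 6 > W - d\,|G| = 0,\\
w = \alpha d,\ \alpha = \tfrac{2}{3}:&\quad \alpha d = \tfrac{6}{7} < 1 = \min_v w(v,G),\ \text{so } C_{\alpha d}(G) = G,\\
&\quad W(C_{\alpha d}(G)) = 9 > (1-\alpha)W = 3.
\end{aligned}
$$

In this example both bounds hold with plenty of room; the lemma only guarantees them in the worst case.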

Proof of Theorem 1. Let $\{H_1, \ldots, H_n\}$ be the induced subgraphs determined
by the ChALK algorithm on the input graph G. It suffices to show
that for any k, there is an integer $j \in [k, n]$ satisfying $d(H_j) \ge \mathrm{dal}(G,k)/3$.

Let H* be an induced subgraph of G with at least k vertices and with
density $d^* = W(H^*)/|H^*| = \mathrm{dal}(G,k)$. We may apply Lemma 2 to H* with
$\alpha = 2/3$ to show that $C_{2d^*/3}(H^*)$ has total weight at least $W(H^*)/3$. Since
every induced subgraph of H* with minimum weighted degree at least $2d^*/3$ is also
such a subgraph of G, this implies that $C_{2d^*/3}(G)$ has total weight at least $W(H^*)/3$.

The core $C_{2d^*/3}(G)$ has density at least $d^*/3$, because its minimum weighted
degree is at least $2d^*/3$ and density is half the average weighted degree. Lemma 1
shows that $C_{2d^*/3}(G) = H_I$ for $I = |C_{2d^*/3}(G)|$. If $I \ge k$, then $H_I$ satisfies
the requirements of the theorem. If $I < k$, then $C_{2d^*/3}(G) = H_I$ is contained in $H_k$,
and the following calculation shows that $H_k$ satisfies the requirements of the theorem:

$$d(H_k) = \frac{W(H_k)}{k} \ge \frac{W(H_I)}{k} \ge \frac{W(H^*)}{3k} \ge \frac{W(H^*)}{3|H^*|} = \frac{d^*}{3}.$$

□

Remark 1. Charikar proved that ChALK(G, 1) is a 2-approximation algorithm
for the densest subgraph problem. This can be derived from the fact
that if w = dmax(G), the w-core of G is nonempty.

4 A 2-approximation algorithm for the densest at-least-k-subgraph problem

In this section, we will give a polynomial time 2-approximation algorithm 
for the densest at-least-k-subgraph problem. The algorithm is based on
the parametric flow algorithm of Gallo, Grigoriadis, and Tarjan [9]. It is 
well-known that the densest subgraph problem can be solved using similar 
techniques; Goldberg pXJ showed that the densest subgraph can be found 
in polynomial time by solving a sequence of maximum flow problems, and 
Gallo, Grigoriadis, and Tarjan described how to find the densest subgraph 
using their parametric flow algorithm. 

It is natural to ask whether there is a polynomial time algorithm for the 
densest at-least-k-subgraph problem. We do not know of such an algorithm,
nor have we proved that DalkS is NP-complete.

Theorem 3. There is a polynomial time 2-approximation algorithm for the
densest at-least-k-subgraph problem.

Proof. The parametric flow algorithm of Gallo, Grigoriadis, and Tarjan can
compute in polynomial time a collection $\mathcal{H}$ of nested induced subgraphs of
G such that for any value of $\alpha$, the following expression is maximized by one
of the subgraphs in $\mathcal{H}$:

$$\max_{H}\ |H|\,(d(H) - \alpha). \qquad (1)$$

Let $\mathcal{H}'$ be the modified collection of subgraphs obtained by padding each
subgraph in $\mathcal{H}$ with arbitrary vertices until its size is at least k. We will
show that there is a set $H \in \mathcal{H}'$ that satisfies $d(H) \ge \mathrm{dal}(G,k)/2$. Thus, a
polynomial time 2-approximation algorithm for DalkS can be obtained by
computing $\mathcal{H}$, padding the sets with fewer than k vertices to form $\mathcal{H}'$,
and returning the densest set in $\mathcal{H}'$. The running time is dominated by the
parametric flow algorithm.

Let H* be an induced subgraph of G with at least k vertices that has
density $d(H^*) = \mathrm{dal}(G,k)$. Let $\alpha = \mathrm{dal}(G,k)/2$, and let H be the set from
$\mathcal{H}$ that maximizes (1) for this value of $\alpha$. In particular,

$$|H|\,(d(H) - \alpha) \ge |H^*|\,(d(H^*) - \alpha) = |H^*|\,d(H^*)/2. \qquad (2)$$

This implies that H satisfies $d(H) > \alpha = \mathrm{dal}(G,k)/2$. If $|H| \ge k$, then we
are done. If $|H| < k$, then consider the set H' of size exactly k obtained by
padding H with arbitrary vertices. We will show that $d(H') \ge \mathrm{dal}(G,k)/2$,
which will complete the proof. First, notice that (2), together with
$|H|\,d(H) \ge |H|\,(d(H) - \alpha)$, implies a lower bound on the size of H:

$$|H| \ge |H^*|\,\frac{d(H^*)}{2\,d(H)} = |H^*|\,\frac{\mathrm{dal}(G,k)}{2\,d(H)}.$$

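We can then bound the density of the padded set H'. Since H' contains H and has exactly k vertices,

$$d(H') \ge d(H)\,\frac{|H|}{k} \ge \frac{d(H)}{k}\,|H^*|\,\frac{\mathrm{dal}(G,k)}{2\,d(H)} = \frac{\mathrm{dal}(G,k)}{2}\,\frac{|H^*|}{k} \ge \frac{\mathrm{dal}(G,k)}{2}.$$

□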
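The algorithm of Theorem 3 can be sketched as follows, with two loudly stated substitutions. First, instead of the Gallo–Grigoriadis–Tarjan parametric flow algorithm, we maximize $W(H) - \alpha|H| = |H|(d(H) - \alpha)$ for one $\alpha$ at a time with a textbook max-closure min-cut network solved by plain Edmonds–Karp. Second, we find one maximizer per linear piece of $f(\alpha) = \max_H W(H) - \alpha|H|$ by recursively probing where two optimal lines intersect; the resulting family need not be nested, which the proof above does not require. Exact `Fraction` arithmetic avoids tolerance issues. The representation (`adj[u][v] = weight`, integer vertex ids, $1 \le k \le n$) and all names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from typing import Dict, List, Set

Adj = Dict[int, Dict[int, float]]  # adj[u][v] = weight of edge {u, v}, int vertex ids


def weight(adj: Adj, h: Set[int]) -> Fraction:
    """W(H) of the subgraph induced by h, in exact rational arithmetic."""
    return sum((Fraction(w) for u in h for v, w in adj[u].items() if v in h and u < v),
               Fraction(0))


def max_closure(adj: Adj, alpha: Fraction) -> Set[int]:
    """Some H maximizing W(H) - alpha*|H| = |H|(d(H) - alpha), via one min cut.

    Network: s -> edge node (cap w_e), edge node -> both endpoints (cap big),
    vertex -> t (cap alpha); the source side of a minimum cut gives H.
    Max flow is plain Edmonds-Karp, not the parametric algorithm of [9].
    """
    cap: Dict[object, Dict[object, Fraction]] = {}

    def add(a, b, c):
        cap.setdefault(a, {})
        cap.setdefault(b, {}).setdefault(a, Fraction(0))  # residual reverse arc
        cap[a][b] = cap[a].get(b, Fraction(0)) + c

    big = weight(adj, set(adj)) + 1          # larger than any finite cut
    for u in adj:
        add(("v", u), "t", alpha)
        for v, w in adj[u].items():
            if u < v:
                add("s", ("e", u, v), Fraction(w))
                add(("e", u, v), ("v", u), big)
                add(("e", u, v), ("v", v), big)
    while True:                              # BFS for a shortest augmenting path
        parent = {"s": None}
        queue = deque(["s"])
        while queue:
            a = queue.popleft()
            for b, c in cap.get(a, {}).items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if "t" not in parent:
            break
        path, b = [], "t"
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
    # The last BFS found everything reachable from s in the residual network.
    return {x[1] for x in parent if isinstance(x, tuple) and x[0] == "v"}


def maximizer_family(adj: Adj) -> List[Set[int]]:
    """One maximizer of (1) on each linear piece of f(a) = max_H W(H) - a|H|."""
    found = [max_closure(adj, Fraction(0)), max_closure(adj, weight(adj, set(adj)) + 1)]

    def search(h1: Set[int], h2: Set[int]) -> None:
        # h1 is optimal at some a1, h2 at some a2 > a1; probe where their lines meet.
        if len(h1) == len(h2):
            return
        a = (weight(adj, h1) - weight(adj, h2)) / (len(h1) - len(h2))
        h = max_closure(adj, a)
        if weight(adj, h) - a * len(h) > weight(adj, h1) - a * len(h1):
            found.append(h)                  # a new piece of the upper envelope
            search(h1, h)
            search(h, h2)

    search(found[0], found[1])
    return found


def dalks_two_approx(adj: Adj, k: int) -> Set[int]:
    """Pad each maximizer to >= k vertices and keep the densest (assumes 1 <= k <= n)."""
    best, best_d = set(), Fraction(-1)
    for h in maximizer_family(adj):
        padded = set(h)
        for u in sorted(adj):                # "arbitrary" padding: smallest ids first
            if len(padded) >= k:
                break
            padded.add(u)
        d = weight(adj, padded) / len(padded)
        if d > best_d:
            best, best_d = padded, d
    return best
```

Because every linear piece of $f$ contributes a maximizer, the family contains a maximizer of (1) at $\alpha = \mathrm{dal}(G,k)/2$, which is all the proof of Theorem 3 uses.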

5 The densest at-most-k-subgraph problem



In this section, we show that the densest at-most-k-subgraph problem is
nearly as hard to approximate as the densest k-subgraph problem. We will
show that if there exists a polynomial time algorithm that approximates
DamkS in a weak sense, returning a set of at most $\beta k$ vertices with density
at least $1/\gamma$ times the density of the densest subgraph on at most k vertices,
then there exists a polynomial time approximation algorithm for DkS
with ratio $4(\gamma^2 + \gamma\beta)$. As an immediate consequence, a polynomial time
$\gamma$-approximation algorithm for the densest at-most-k-subgraph problem would
imply a polynomial time $4(\gamma^2 + \gamma)$-approximation algorithm for the densest
k-subgraph problem.

Definition 5. An algorithm A(G, k) is a $(\beta, \gamma)$-algorithm for the densest
at-most-k-subgraph problem if for any input graph G and integer k, it returns
an induced subgraph of G with at most $\beta k$ vertices and density at least
$\mathrm{dam}(G,k)/\gamma$.

Theorem 4. If there is a polynomial time $(\beta, \gamma)$-algorithm for the densest
at-most-k-subgraph problem (where $\beta$ and $\gamma$ are at least 1), then there is
a polynomial time $4(\gamma^2 + \gamma\beta)$-approximation algorithm for the densest
k-subgraph problem.

Proof. Assume there exists a polynomial time algorithm A(G, k) that is a
$(\beta, \gamma)$-algorithm for DamkS. We will now describe a polynomial time
approximation algorithm for DkS with ratio $4(\gamma^2 + \gamma\beta)$.

Given as input a graph G and integer k, let $G_1 = G$, let $i = 1$, and repeat
the following procedure. Let $H_i = A(G_i, k)$ be an induced subgraph of $G_i$
with at most $\beta k$ vertices and with density at least $\mathrm{dam}(G_i,k)/\gamma$. Remove
all the edges in $H_i$ from $G_i$ to form a new graph $G_{i+1}$ on the same vertex
set as G, and increment i. Repeat this procedure until all edges have been removed from G.

Let $n_i$ be the number of vertices in $H_i$, let $W_i = W(H_i)$, and let
$d_i = d(H_i) = W_i/n_i$. Let H* be an induced subgraph of G with exactly k vertices
and density $d^* = \mathrm{dex}(G,k)$. Notice that if $W_1 + \cdots + W_{t-1} < W(H^*)/2$,
then $d_t > d^*/(2\gamma)$. This is because $d_t$ is at least $1/\gamma$ times the density of the
induced subgraph of $G_t$ on the vertex set of H*, which is at least

$$\frac{W(H^*) - (W_1 + \cdots + W_{t-1})}{k} > \frac{W(H^*)}{2k} = \frac{d^*}{2}.$$
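
Let T be the smallest integer such that $W_1 + \cdots + W_T \ge W(H^*)/2$, and
let $U_T$ be the induced subgraph on the union of the vertex sets of $H_1, \ldots, H_T$.
The total weight $W(U_T)$ is at least $W(H^*)/2$, since the edge sets of the $H_i$ are
disjoint. By the choice of T, each $d_i$ with $i \le T$ satisfies the bound above, so the
density of $U_T$ is

$$d(U_T) = \frac{W(U_T)}{|U_T|} \ge \frac{W_1 + \cdots + W_T}{n_1 + \cdots + n_T} \ge \min_{1 \le i \le T} d_i > \frac{d^*}{2\gamma}.$$

To bound the number of vertices in $U_T$, notice that $n_1 + \cdots + n_{T-1} \le \gamma k$,
because

$$\frac{d^*}{2\gamma}\sum_{i=1}^{T-1} n_i \le \sum_{i=1}^{T-1} d_i\,n_i = \sum_{i=1}^{T-1} W_i < \frac{W(H^*)}{2} = \frac{d^*\,k}{2}.$$

Since $n_T$ is at most $\beta k$, we have $|U_T| \le n_1 + \cdots + n_T \le (\gamma + \beta)k$.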

There are now two cases to consider. If $|U_T| \le k$, we add vertices to $U_T$
arbitrarily to form a set $U'_T$ of size exactly k. The set $U'_T$ is more than dense
enough to prove the theorem:

$$d(U'_T) \ge \frac{W(U_T)}{k} \ge \frac{W(H^*)}{2k} = \frac{d^*}{2}.$$

If $|U_T| > k$, then we employ a simple greedy procedure to reduce the number
of vertices. We begin with the induced subgraph $U_T$, greedily remove the
vertex with smallest degree to obtain a smaller subgraph, and repeat until
exactly k vertices remain. The resulting subgraph $U'_T$ has density at least
$d(U_T)\,k/(2|U_T|)$ by the method of conditional expectations (see also [7]).
The set $U'_T$ is sufficiently dense:

$$d(U'_T) \ge \frac{k}{2|U_T|}\,d(U_T) \ge \left(\frac{k}{2(\gamma+\beta)k}\right)\left(\frac{d^*}{2\gamma}\right) = \frac{d^*}{4(\gamma^2 + \gamma\beta)}.$$



□ 
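The greedy trimming bound $d(U'_T) \ge d(U_T)\,k/(2|U_T|)$ used in the last case can also be checked by a direct averaging argument; the following is a short derivation of our own, assuming $2 \le k < N$ with $N = |U_T|$ (for $k = 1$ the theorem is trivial, since $\mathrm{dex}(G,1) = 0$). Write $W_j$ for the weight remaining when j vertices are left; a j-vertex graph has a vertex of weighted degree at most its average $2W_j/j$, so

$$
\begin{aligned}
W_{j-1} &\ge W_j - \frac{2W_j}{j} = \frac{j-2}{j}\,W_j,\\
W_k &\ge W_N \prod_{j=k+1}^{N} \frac{j-2}{j} = W_N\,\frac{k(k-1)}{N(N-1)},\\
d(U'_T) = \frac{W_k}{k} &\ge d(U_T)\,\frac{k-1}{N-1} \ge d(U_T)\,\frac{k}{2N},
\end{aligned}
$$

where the last step uses $2N(k-1) - k(N-1) = N(k-2) + k \ge 0$.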



Remark 2. The argument from Theorem 4 proves a slightly more general
statement: if there is a polynomial time algorithm for DamkS that is a
$(\beta, \gamma)$-algorithm for certain values of k, then there is a polynomial time algorithm
for DkS that is a $4(\gamma^2 + \gamma\beta)$-approximation algorithm for those same values
of k.
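The reduction in the proof of Theorem 4 can be sketched as follows. The DamkS routine is passed in as `oracle`, a hypothetical $(\beta, \gamma)$-algorithm; for testing we supply an exhaustive exact solver, which is a $(1, 1)$-algorithm. Since the algorithm cannot compute $W(H^*)$ and hence cannot identify T, the sketch tries every prefix union $U_t$, pads or greedily trims it to exactly k vertices, and keeps the heaviest result; this is one natural way to realize the proof, not necessarily the paper's exact procedure. The edge-dictionary representation and all names are assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set

Graph = Dict[FrozenSet[int], float]              # frozenset({u, v}) -> edge weight
Oracle = Callable[[Graph, Set[int], int], Set[int]]


def weight(g: Graph, s: Set[int]) -> float:
    return sum(w for e, w in g.items() if e <= s)


def exact_damks(g: Graph, vertices: Set[int], k: int) -> Set[int]:
    """Exhaustive DamkS, i.e. a (1, 1)-algorithm; exponential, for testing only."""
    best, best_d = set(), -1.0
    for size in range(1, k + 1):
        for c in combinations(sorted(vertices), size):
            d = weight(g, set(c)) / size
            if d > best_d:
                best, best_d = set(c), d
    return best


def dks_from_damks(g: Graph, vertices: Set[int], k: int, oracle: Oracle) -> Set[int]:
    """Sketch of the reduction behind Theorem 4; assumes 1 <= k <= |vertices|."""
    remaining, picked = dict(g), []              # G_i and H_1, H_2, ...
    while remaining:
        h = set(oracle(remaining, vertices, k))
        removed = [e for e in remaining if e <= h]
        if not removed:                           # no progress: stop defensively
            break
        picked.append(h)
        for e in removed:                         # G_{i+1} = G_i minus the edges of H_i
            del remaining[e]

    def to_k(u: Set[int]) -> Set[int]:
        u = set(u)
        for v in sorted(vertices):                # pad with arbitrary vertices
            if len(u) >= k:
                break
            u.add(v)
        while len(u) > k:                         # greedy trimming in G[u]
            deg = {v: sum(w for e, w in g.items() if v in e and e <= u) for v in u}
            u.remove(min(u, key=lambda v: (deg[v], v)))
        return u

    # W(H*) is unknown, so try every prefix union U_t and keep the densest k-set.
    best, union = set(sorted(vertices)[:k]), set()
    for h in picked:
        union |= h
        cand = to_k(union)
        if weight(g, cand) > weight(g, best):
            best = cand
    return best
```

Each oracle call removes at least one edge whenever the oracle returns a set of positive density, so the loop makes at most m calls.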

We remark that the densest at-most-k-subgraph problem is easily seen to be
NP-complete, since a subgraph of size at most k has density at least $(k-1)/2$
if and only if it is a k-clique. As mentioned previously, Feige and Seltser [8]
proved that the densest k-subgraph problem remains NP-complete when
restricted to graphs with maximum degree 3, and their proof shows that the
same statement is true for the densest at-most-k-subgraph problem.

6 Conclusion



In this section, we discuss the possibility of improving the approximation
ratio for DkS via an approximation algorithm for DamkS. One possible
approach is to develop a local algorithm for DamkS, analogous to the recently
developed local algorithms for graph partitioning [151 IT]. For any partition
separating k vertices, these algorithms can produce a partition separating
$O(k)$ vertices that is nearly as good (in terms of conductance).

We conjecture that there is a local algorithm for the densest subgraph
problem that finds a subgraph of density at least $\theta/\log n$ on at most $O(k^{1+\delta})$
vertices, whenever there exists a subgraph of density $\theta$ on k vertices. This
would be an $(O(k^{\delta}), \log n)$-algorithm for DamkS in the sense of Definition 5, which would
lead to an approximation algorithm for the densest k-subgraph problem
with ratio $O(k^{\delta} \log^2 n)$. An algorithm with $\delta = 1$ would not be helpful for
approximating DkS, since an approximation ratio of $O(k)$ can be obtained
trivially. At the other extreme, an algorithm with $\delta = 0$ would produce an
$O(\log^2 n)$-approximation algorithm for DkS, which seems unlikely.